[Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image #2339
Merged
wtomin merged 9 commits into vllm-project:main on Apr 8, 2026
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Text to image examples

stabilityai/stable-diffusion-3.5-medium

With layerwise offloading:

python examples/offline_inference/text_to_image/text_to_image.py \
--model stabilityai/stable-diffusion-3.5-medium \
--prompt "A serene mountain landscape at sunset" \
--negative-prompt "blurry, low quality, distorted" \
--guidance-scale 4.5 \
--num-inference-steps 28 \
--height 1024 \
--width 1024 \
--seed 42 \
--output output_sd3_layerwise.png \
--enable-layerwise-offload

Baseline (without offloading):

python examples/offline_inference/text_to_image/text_to_image.py \
--model stabilityai/stable-diffusion-3.5-medium \
--prompt "A serene mountain landscape at sunset" \
--negative-prompt "blurry, low quality, distorted" \
--guidance-scale 4.5 \
--num-inference-steps 28 \
--height 1024 \
--width 1024 \
--seed 42 \
--output output_sd3.png

stepfun-ai/NextStep-1.1

With layerwise offloading:

python examples/offline_inference/text_to_image/text_to_image.py \
--model stepfun-ai/NextStep-1.1 \
--prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
--height 512 \
--width 512 \
--num-inference-steps 28 \
--guidance-scale 7.5 \
--guidance-scale-2 1.0 \
--cfg-schedule constant \
--seed 42 \
--output output_nextstep_layerwise.png \
--enable-layerwise-offload \
--init-timeout 1200 \
--stage-init-timeout 1200

Baseline (without offloading):

python examples/offline_inference/text_to_image/text_to_image.py \
--model stepfun-ai/NextStep-1.1 \
--prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
--height 512 \
--width 512 \
--num-inference-steps 28 \
--guidance-scale 7.5 \
--guidance-scale-2 1.0 \
--cfg-schedule constant \
--seed 42 \
--output output_nextstep.png

AIDC-AI/Ovis-Image-7B

With layerwise offloading:

python examples/offline_inference/text_to_image/text_to_image.py \
--model AIDC-AI/Ovis-Image-7B \
--prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
--height 1024 \
--width 1024 \
--num-inference-steps 50 \
--guidance-scale 5.0 \
--cfg-schedule constant \
--seed 42 \
--output output_ovis_image_layerwise.png \
--enable-layerwise-offload

Baseline (without offloading):

python examples/offline_inference/text_to_image/text_to_image.py \
--model AIDC-AI/Ovis-Image-7B \
--prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
--height 1024 \
--width 1024 \
--num-inference-steps 50 \
--guidance-scale 5.0 \
--cfg-schedule constant \
--seed 42 \
--output output_ovis_image.png

meituan-longcat/LongCat-Image

Baseline (without offloading):

python examples/offline_inference/text_to_image/text_to_image.py \
--model meituan-longcat/LongCat-Image \
--prompt "一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。" \
--height 768 \
--width 1344 \
--num-inference-steps 50 \
--guidance-scale 4.0 \
--seed 42 \
--output output_longcat.png

With layerwise offloading:

python examples/offline_inference/text_to_image/text_to_image.py \
--model meituan-longcat/LongCat-Image \
--prompt "一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。" \
--height 768 \
--width 1344 \
--num-inference-steps 50 \
--guidance-scale 4.0 \
--seed 42 \
--output output_longcat_layerwise.png \
--enable-layerwise-offload
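Each pair above differs only in `--output` and `--enable-layerwise-offload`. One way to keep a baseline/offload pair in sync is to generate both argument lists from a single set of options. This is a minimal sketch: `build_commands` is a hypothetical helper, not part of vllm-omni, and it reuses only the script path and flags shown in the commands above.

```python
import shlex

SCRIPT = "examples/offline_inference/text_to_image/text_to_image.py"


def build_commands(model: str, output_stem: str, **options: object) -> dict[str, list[str]]:
    """Return argv lists for a baseline run and a layerwise-offload run.

    Hypothetical helper: both runs share every option; only the output
    file name and the --enable-layerwise-offload flag differ.
    """
    common = ["python", SCRIPT, "--model", model]
    for key, value in options.items():
        # seed=42 -> "--seed", "42"; negative_prompt -> "--negative-prompt"
        common += ["--" + key.replace("_", "-"), str(value)]
    baseline = common + ["--output", f"{output_stem}.png"]
    layerwise = common + [
        "--output",
        f"{output_stem}_layerwise.png",
        "--enable-layerwise-offload",
    ]
    return {"baseline": baseline, "layerwise": layerwise}


cmds = build_commands(
    "stabilityai/stable-diffusion-3.5-medium",
    "output_sd3",
    prompt="A serene mountain landscape at sunset",
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=4.5,
    num_inference_steps=28,
    height=1024,
    width=1024,
    seed=42,
)
for name, argv in cmds.items():
    # shlex.join quotes the prompt so the printed line is copy-pasteable
    print(name, shlex.join(argv))
```

The lists can be passed directly to `subprocess.run(argv, check=True)`. For NextStep-1.1, the extra timeouts would be added to both runs as `init_timeout=1200, stage_init_timeout=1200`.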
gcanlin (Collaborator) approved these changes on Apr 7, 2026 and left a comment:

LGTM, please fix conflicts :)
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request on Apr 9, 2026:
…p_1, LongCat-Image (vllm-project#2339)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
bob-021206 pushed a commit to jasonlee-1024/vllm-omni that referenced this pull request on Apr 21, 2026:
…p_1, LongCat-Image (vllm-project#2339)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: bob-021206 <binyan_github@163.com>
Purpose
This PR adds and validates layerwise CPU offloading support for more diffusion models (or individual components of omni models).
Most of the work consists of testing and verifying that these models work with the feature. Out-of-scope issues or model-specific handling may be addressed in a follow-up PR.
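The PR text does not describe the offloading mechanism itself. As a rough mental model only, assume a scheme that keeps block weights in CPU memory, prefetches block i+1 to the GPU while block i runs, and evicts block i afterwards. The pure-Python toy below (`LayerwiseOffloadSim` is hypothetical; there is no torch and no real copy) tracks only the bookkeeping: how many host-to-device copies happen and how many bytes of block weights are resident at peak.

```python
from dataclasses import dataclass, field


@dataclass
class LayerwiseOffloadSim:
    """Toy model of layerwise offloading (hypothetical, not vllm-omni code).

    Assumption: block weights live on CPU; before block i runs, block i+1 is
    prefetched to the GPU, and block i is evicted once it finishes.
    """

    layer_bytes: list[int]
    on_gpu: set[int] = field(default_factory=set)
    transfers: int = 0
    peak_bytes: int = 0

    def _load(self, idx: int) -> None:
        if idx < len(self.layer_bytes) and idx not in self.on_gpu:
            self.on_gpu.add(idx)
            self.transfers += 1  # one simulated host-to-device copy
            resident = sum(self.layer_bytes[i] for i in self.on_gpu)
            self.peak_bytes = max(self.peak_bytes, resident)

    def forward_pass(self) -> None:
        """Run every block once, e.g. one denoising step."""
        self._load(0)
        for i in range(len(self.layer_bytes)):
            self._load(i + 1)  # prefetch the next block while block i "computes"
            self.on_gpu.discard(i)  # evict block i after it finishes


sim = LayerwiseOffloadSim(layer_bytes=[100] * 24)
for _ in range(28):  # e.g. 28 denoising steps
    sim.forward_pass()
print(sim.transfers, sim.peak_bytes)  # 672 200
```

With 24 equal blocks, peak residency is two blocks instead of all 24, at the cost of reloading every block on every pass; the transfer count is what grows when a model runs many passes.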
Models planned and supported in this PR:
- stabilityai/stable-diffusion-3.5-medium
- AIDC-AI/Ovis-Image-7B
- stepfun-ai/NextStep-1.1
- meituan-longcat/LongCat-Image

Planned but not enabled in this PR:
- ValueError: Tokenizer class MammothUTokenizer does not exist or is not currently imported.

Test Plan
Offline generation; see the following comment for the detailed test commands:
#2339 (comment)
Test Result
Stats
* Tested on H100, single device.
* Peak memory is recorded by DiffusionModelRunner._record_peak_memory.
* Enabling layerwise offloading on stepfun-ai/NextStep-1.1 is strongly discouraged: it is an autoregressive (AR) model with diffusion heads that runs multiple denoising steps for each generated token, so a large amount of offloading happens.
* Total generation time increased in the profiling above when the feature was enabled. I suspect that for image generation the compute finishes faster than the offloaded weights can be brought back, so the transfers are not fully hidden. The same behavior was seen before on Qwen-Image image generation tasks: #858 (comment). Further profiling on specific devices may be needed.
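The NextStep note can be made concrete with a back-of-the-envelope count. Assuming (hypothetically) that every offloaded block is copied to the GPU once per forward pass, and that in an AR model with a diffusion head the offloaded blocks run once per denoising step for every generated token, the number of copies scales with tokens × steps rather than with steps alone. All numbers below are illustrative placeholders, not measurements from this PR, and `layer_transfers` is a hypothetical helper.

```python
def layer_transfers(num_blocks: int, passes: int) -> int:
    """Host-to-device block copies, assuming each pass reloads every block once."""
    return num_blocks * passes


# Illustrative placeholder sizes, not taken from the PR.
BLOCKS = 24
STEPS = 28
TOKENS = 1024

# Diffusion transformer: one pass over the blocks per denoising step.
dit = layer_transfers(BLOCKS, STEPS)

# AR model with a diffusion head (assumption): the denoising loop, and therefore
# the offloaded blocks, repeat for every generated token.
ar_diffusion_head = layer_transfers(BLOCKS, STEPS * TOKENS)

print(dit, ar_diffusion_head, ar_diffusion_head // dit)  # 672 688128 1024
```

Under these assumptions the AR case performs TOKENS times more copies for the same per-pass offloading, which is why the per-token loop makes layerwise offloading expensive there.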
Generated image comparison
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.